[Figure: labels "Sequence (seq_id: S01)", "Replet" and "Backbone"; reference numerals 100, 105, 110 and 115.]

635728US 



FIG. 1 

2003.09.15



(l:\ELEC\IBMUP920030l52US11jp9200301 52us1 .figs.final.dociedg 



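[Figure: flowchart. Recoverable box labels: "Input Sequences", "User view description / Rules", "Pre-process", "Replet-sequence matrix", "Variation Tables", "Discover Patterns and the set ...", "Represent ... using ... in set ...", "Sequence Backbones"; reference numerals 200, 205, 210, 215, 220, 230, 245, 250 and 255. The symbols inside several labels are illegible.]
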
FIG. 2 






FIG. 3 






[Figure: diagram 400 showing the patterns "cgcgcgcgcg", "aaataa..aaa", "acagg..ta.gcc..c" and "tactata.....ttac" above the sequence nodes Seq 1, Seq 2, Seq 3 and Seq 4. Legend: Level 1 edge, Pattern connector, Level 2 edge, Level 3 edge.]



FIG. 4 






[Figure: diagram 500 showing the patterns "cgcgcgcgcg", "aa..a...a", "aaataa..aaa", "acagg..ta.gcc..c" and "tactata.....ttac" above the sequence nodes Seq 2, Seq 3, Seq 4 and Seq 5. Legend: Level 1 edge, Pattern connector, Level 2 edge, Level 3 edge, Base-replet connector.]



FIG. 5 






[Figure: diagram 600 showing the patterns "cgcgcgcgcg", "aa..a...a", "aaataa..aaa", "acagg..ta.gcc..c", "tactata.....ttac" and "actata". Legend: Level 1 edge, Pattern connector, Level 2 edge, Level 3 edge, Base-replet connector.]



FIG. 6 






Algorithm reconstruct (sequence-id seq_id)
Begin

    backbone = getBackbone(seq_id);
    /* getBackbone(seq_id) searches the backbone list and returns the backbone
       corresponding to seq_id */

    Match-Set mr = getheadof(seq_id);
    /* returns the first match-set instance of the sequence seq_id */

    String seq = "";
    poffset = 0;
    Hashtable ht = {};
    loopcnt = 0; bptr = 0;

    While (mr != null) { /* 'null' represents the end of traversal */

        roffset = getOffset(mr, loopcnt);
        /* returns the loopcnt-th matching offset of the instance mr */

        if ((roffset - poffset) > 0) {
            seq = concat(seq, substring(backbone, bptr, roffset - poffset));
            bptr = bptr + roffset - poffset;
        }

        poffset = roffset + length(getreplet(mr));
        /* getreplet(mr) returns the replet in mr */

        seq = concat(seq, resolve(getreplet(mr), getVarInfo(mr, roffset)));
        /* getVarInfo(mr, roffset) provides the variation information for the
           replet in mr at the roffset */
        /* resolve(replet, var-info) generates the subsequence represented by
           replet + var-info */

        add(mr, ht);
        /* increments the occurrence count of the replet in mr when traversing
           the sequence */

        loopcnt = no-of-occurance(mr, ht);
        /* no-of-occurance(mr, ht) returns the number of times the replet in mr
           has occurred up to this point of traversal */

        mr = getnextbasematchset(mr, loopcnt - 1);
        /* getnextbasematchset(mr, cnt) provides the next occurring base replet's
           match-set instance; this corresponds to the cnt-th pointer in the
           current mr */

        loopcnt = no-of-occurance(mr, ht);
    }

    seq = concat(seq, substring(backbone, bptr, length(backbone) - 1));
    return seq;

End
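
The listing above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation the figure describes: the `MatchSet` class, its field names, and the encoding of variation information as a string of fill characters (one per `.` in the replet, one string per occurrence) are hypothetical. The last line appends whatever backbone text remains after `bptr`, which is one reading of the final `substring` call.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MatchSet:
    """Hypothetical stand-in for one match-set instance of a base replet."""
    pattern_id: int
    replet: str                   # '.' marks a wildcard position
    offsets: List[int]            # matching offsets, one per occurrence
    var_info: List[str]           # assumed: wildcard fill text per occurrence
    edges: List[Optional[int]]    # pattern id that follows each occurrence


def resolve(replet: str, fill: str) -> str:
    """Replace each '.' in the replet with the next character of fill."""
    chars = iter(fill)
    return "".join(next(chars) if c == "." else c for c in replet)


def reconstruct(backbone: str, match_sets: Dict[int, MatchSet],
                head: Optional[int]) -> str:
    seq = ""
    poffset = 0                   # offset just past the last replet emitted
    bptr = 0                      # read position in the backbone
    counts: Dict[int, int] = {}   # plays the role of the hashtable ht
    mr = head
    while mr is not None:
        ms = match_sets[mr]
        k = counts.get(mr, 0)     # loopcnt: prior occurrences of this replet
        roffset = ms.offsets[k]
        gap = roffset - poffset
        if gap > 0:               # backbone text lies before this replet
            seq += backbone[bptr:bptr + gap]
            bptr += gap
        poffset = roffset + len(ms.replet)
        seq += resolve(ms.replet, ms.var_info[k])
        counts[mr] = k + 1        # add(mr, ht)
        mr = ms.edges[k]          # getnextbasematchset(mr, loopcnt - 1)
    return seq + backbone[bptr:]  # trailing backbone text, if any


# Toy input: one replet "g.g" at offset 2 of a 7-character sequence.
toy = {1: MatchSet(1, "g.g", offsets=[2], var_info=["t"], edges=[None])}
print(reconstruct("aacc", toy, head=1))  # aagtgcc
```

Keeping the occurrence count per pattern id is what lets a single match-set serve every occurrence of its replet: the count selects both the matching offset and the outgoing edge.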
Backbone = bseq 3: acttgatcggtagctagacggagaagctcccaaaac

Base replets occurring in sequence 3 are {cgcgcgcgcg, aaataa..aaa, acagg..ta.gcc..c, tactata.....ttac}

The match-sets of the base replets are provided below.
1: cgcgcgcgcg
{
    Sequence-id = 3
    Pattern-id = 1
    Array of Matching-offsets <K,5> = {18, 39, 83}
    Array of Is-base-replet = {true, true, true}
    Array of Pointer to Base-replet = {null, null, null}
    Array of sequence-formation-edges = {2, 3, 4}
    Pointer to next-pattern instance = {...}, Pointer to previous-pattern instance = {...}
}

2: aaataa..aaa
{
    Sequence-id = 3
    Pattern-id = 2
    Array of Matching-offsets <K,5> = {28}
    Array of Is-base-replet = {true}
    Array of Pointer to Base-replet = {null}
    Array of sequence-formation-edges = {1}
    Pointer to next-pattern instance = {...}, Pointer to previous-pattern instance = {...}
}

3: acagg..ta.gcc..c
{
    Sequence-id = 3
    Pattern-id = 3
    Array of Matching-offsets <K,5> = {49}
    Array of Is-base-replet = {true}
    Array of Pointer to Base-replet = {null}
    Array of sequence-formation-edges = {1}
    Pointer to next-pattern instance = {...}, Pointer to previous-pattern instance = {...}
}

4: tactata.....ttac
{
    Sequence-id = 3
    Pattern-id = 4
    Array of Matching-offsets <K,5> = {93}
    Array of Is-base-replet = {true}
    Array of Pointer to Base-replet = {null}
    Array of sequence-formation-edges = {null}
    Pointer to next-pattern instance = {...}, Pointer to previous-pattern instance = {...}
}
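
As a cross-check of the four match-sets, the sketch below orders every occurrence by its matching offset and confirms that each sequence-formation edge names the replet that occurs next in the sequence. The offsets and edges are copied from the match-sets; the dictionary layout and variable names are assumptions made for illustration.

```python
# Matching offsets and sequence-formation edges copied from the match-sets.
# The dict layout is an illustrative assumption, not the figure's structure.
match_sets = {
    1: {"offsets": [18, 39, 83], "edges": [2, 3, 4]},
    2: {"offsets": [28], "edges": [1]},
    3: {"offsets": [49], "edges": [1]},
    4: {"offsets": [93], "edges": [None]},
}

# Flatten to (offset, pattern_id, edge) and sort by position in the sequence.
# Offsets are distinct, so the sort never has to compare the edge values.
occurrences = sorted(
    (off, pid, ms["edges"][i])
    for pid, ms in match_sets.items()
    for i, off in enumerate(ms["offsets"])
)

order = [pid for _, pid, _ in occurrences]
print(order)  # [1, 2, 1, 3, 1, 4]

# Each edge should point at the pattern of the next occurrence (None at the end).
expected_next = order[1:] + [None]
edges_ok = all(edge == nxt
               for (_, _, edge), nxt in zip(occurrences, expected_next))
print(edges_ok)  # True
```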
Start of first while loop

bptr=0; seq=""; poffset=0; loopcnt=0; ht={}; mr=1
Inside the loop
roffset=18
Condition true -> inside 'if'
seq=acttgatcggtagctaga
bptr=18
Outside 'if'
poffset=28
seq=acttgatcggtagctagacgcgcgcgcg
ht={<1,1>}
loopcnt=1
mr=2
loopcnt=0

Start of second loop as mr!=null

roffset=28
Condition false
poffset=39
seq=acttgatcggtagctagacgcgcgcgcgaaataattaaa
ht={<1,1>,<2,1>}
loopcnt=1
mr=1
loopcnt=1

Start of third loop as mr!=null

roffset=39
Condition false
poffset=49
seq=acttgatcggtagctagacgcgcgcgcgaaataattaaacgcgcgcgcg
ht={<1,2>,<2,1>}
loopcnt=2
mr=3
loopcnt=0

Start of fourth loop as mr!=null

roffset=49
Condition false
poffset=65
seq=acttgatcggtagctagacgcgcgcgcgaaataattaaacgcgcgcgcgacaggtataggccaac
ht={<1,2>,<2,1>,<3,1>}
loopcnt=1
mr=1
loopcnt=2

FIG. 9B 
Start of fifth loop as mr!=null

roffset=83
Condition true -> inside 'if'
seq=acttgatcggtagctagacgcgcgcgcgaaataattaaacgcgcgcgcgacaggtataggccaaccggagaagctcccaaaac
bptr=36
Outside 'if'
poffset=93
seq=acttgatcggtagctagacgcgcgcgcgaaataattaaacgcgcgcgcgacaggtataggccaaccggagaagctcccaaaaccgcgcgcgcg
ht={<1,3>,<2,1>,<3,1>}
loopcnt=3
mr=4
loopcnt=0

Start of sixth loop as mr!=null

roffset=93
Condition false

poffset=109

seq=acttgatcggtagctagacgcgcgcgcgaaataattaaacgcgcgcgcgacaggtataggccaaccggagaagctcccaaaaccgcgcgcgcgtactatatcatattac
ht={<1,3>,<2,1>,<3,1>,<4,1>}
loopcnt=1
mr=null
loopcnt=1

The while loop terminates because mr = null.
Outside the while loop

No subsequence of the backbone remains to be added to 'seq', because bptr has reached the end of the backbone.

Return seq

Output =
"acttgatcggtagctagacgcgcgcgcgaaataattaaacgcgcgcgcgacaggtataggccaaccggagaagctcccaaaaccgcgcgcgcgtactatatcatattac"
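
The result of the walkthrough can be checked without traversing the match-sets. The sketch below places each resolved replet at its matching offset and fills the uncovered positions from the backbone in order. The resolved replet strings are read off the successive `seq` values in the trace; the helper name `splice` and the list layout are assumptions made for illustration.

```python
backbone = "acttgatcggtagctagacggagaagctcccaaaac"

# (matching offset, resolved replet text), read off the walkthrough.
placements = [
    (18, "cgcgcgcgcg"),
    (28, "aaataattaaa"),
    (39, "cgcgcgcgcg"),
    (49, "acaggtataggccaac"),
    (83, "cgcgcgcgcg"),
    (93, "tactatatcatattac"),
]


def splice(backbone: str, placements: list) -> str:
    """Interleave backbone text with replets placed at absolute offsets."""
    seq, bptr = "", 0
    for offset, text in sorted(placements):
        gap = offset - len(seq)   # positions not covered by any replet
        seq += backbone[bptr:bptr + gap] + text
        bptr += gap
    return seq + backbone[bptr:]  # backbone text after the last replet


result = splice(backbone, placements)
print(result)
print(len(result), len(backbone))  # 109 36
```

The backbone contributes 36 of the 109 characters; the remaining 73 come from the six replet occurrences, which agrees with the last matching offset (93) plus the 16 characters of the final resolved replet.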